test(#276): Set up the UTM macOS VM verification harness #277

Merged: GeneralD merged 13 commits into main from test/#276-utm-vm-verification-harness on Jun 12, 2026

@GeneralD (Owner) commented Jun 12, 2026

Closes #276

Overview

Set up a harness that uses a UTM macOS VM to run lyra's OS-level verification (service lifecycle, launchd KeepAlive, guest reboot, log collection) without polluting the development machine.

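Added files

.claude/scripts/lyra-vm-harness.sh

A harness shell script built on utmctl + SSH. Subcommands:

| Subcommand | Description |
| --- | --- |
| `boot <vm>` | Start the VM and wait for SSH |
| `shutdown <vm>` | Gracefully shut down the guest |
| `reboot <vm>` | Reboot the guest and wait for SSH |
| `run-lyra <vm>` | Build on the host → push the binary → install → start the daemon |
| `capture <vm> [dir]` | Collect a screenshot, the unified log, and a process sample |
| `restore <vm>` | Stop the daemon and restore the brew service state |
| `exec <vm> -- <cmd>` | Run an arbitrary command over SSH |
| `ip <vm>` | Print the guest IP |

SSH is the primary path; utmctl is used only for lifecycle operations (start/stop/status/ip-address). utmctl exec is not used because it bypasses the login shell on macOS guests running the Apple backend.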
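
A typical verification run could be chained as sketched below. This is a minimal sketch and not taken from the PR: the `verify_in_vm` function and the `HARNESS` variable are hypothetical names, and only the subcommands (`boot`, `run-lyra`, `capture`, `restore`) and the script path come from the description above. The `EXIT` trap inside the subshell is there so that `restore` runs even when an earlier step fails.

```bash
# Path to the harness; HARNESS is a hypothetical override for illustration.
HARNESS="${HARNESS:-.claude/scripts/lyra-vm-harness.sh}"

# verify_in_vm <vm> [out_dir]
# Hypothetical wrapper: boot the guest, deploy lyra, collect artifacts,
# and always restore the guest's service state afterwards.
verify_in_vm() {
  local vm="$1"
  local out_dir="${2:-./vm-artifacts}"
  (
    # The trap fires when this subshell exits, on success or on failure.
    trap '"$HARNESS" restore "$vm"' EXIT
    "$HARNESS" boot "$vm" &&
      "$HARNESS" run-lyra "$vm" &&
      "$HARNESS" capture "$vm" "$out_dir"
  )
}
```

Calling `verify_in_vm my-vm ./artifacts` would return the status of the first failing step, with `restore` having run in either case.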

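.claude/rules/vm-verification.md

lyra-specific classification of verification lanes:

  • A list of what can and cannot be verified in the VM
  • States explicitly that Dynamic Resolution is an approximation of a resolution change, not a substitute for display hot-plug
  • Decision criteria: display topology changes (NSScreen added or removed) are covered by ScreenProvider fixture tests, while physical hot-plug is left to the final manual smoke test

.claude/rules/dev-verification.md (appended)

Appended a pointer to the VM lane at the end. It spells out the split: visual checks use the host debug-build procedure, and OS-level side effects go to the VM lane.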

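Design policy (separation of responsibilities)

| Location | Content |
| --- | --- |
| lyra `.claude/rules/` | lyra-specific decision criteria and harness usage |
| lyra `.claude/scripts/` | The harness itself, which knows lyra's build and service names |
| Global `~/.config/claude/rules/utm-macos-vm.md` | Generic utmctl + SSH patterns (reusable for other macOS apps) |

The lyra project does not depend on the global rules. It shares the same design philosophy but is self-contained.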

Summary by CodeRabbit

  • Documentation

    • Added VM verification lane docs with prerequisites, scenario-to-lane guidance, lifecycle best practices, and cleanup patterns
    • Updated Swift project verification configuration
  • New Features

    • Added a macOS VM testing harness to manage VM lifecycle, run the app in-guest, capture screenshots/logs/samples, and ensure clean restore
  • Chores

    • Bumped version to 2.13.14
  • Behavior

    • Improved YouTube download selection to prefer best video with an automatic audio/video fallback when needed

GeneralD added 2 commits June 12, 2026 14:59
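Add .claude/rules/vm-verification.md. It defines the lyra-specific
classification of verification lanes (what can and cannot be verified in
the VM, where Dynamic Resolution fits, and the split between display
hot-plug and ScreenProvider fixtures).

Add .claude/scripts/lyra-vm-harness.sh. It provides a harness that uses
utmctl + SSH to manage the VM lifecycle from the host, build/install/start
lyra, save and restore service state, and collect artifacts (screenshots,
logs, process samples). The script is self-contained and does not depend
on user-global skills or rules.

Add a reference to the VM lane in .claude/rules/dev-verification.md.

Design policy: lyra-specific procedures live in the project's .claude/,
while generic macOS VM operation patterns are split out into the
user-global ~/.config/claude/rules/utm-macos-vm.md. The two share the same
design philosophy but do not depend on each other.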
Copilot AI review requested due to automatic review settings June 12, 2026 05:59
@GeneralD GeneralD self-assigned this Jun 12, 2026

Copilot AI left a comment

Copilot was unable to review this pull request because the user who requested the review has reached their quota limit.

@coderabbitai

coderabbitai Bot commented Jun 12, 2026

Warning

Review limit reached

@GeneralD, we couldn't start this review because you've reached your PR review rate limit.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: e7561c84-1392-4f53-b449-fc5146486866

📥 Commits

Reviewing files that changed from the base of the PR and between 187d403 and 29ea18d.

📒 Files selected for processing (1)
  • .claude/scripts/lyra-vm-harness.sh
📝 Walkthrough

Adds VM verification rules and docs, a host-side Bash harness to orchestrate UTM macOS VM lifecycle and run/capture/restore lyra in-guest, tweaks yt-dlp arguments for YouTube wallpaper downloads, and increments project version to 2.13.14.

Changes

VM Verification Infrastructure

| Layer / File(s) | Summary |
| --- | --- |
| VM verification lane rules and documentation (`.claude/rules/dev-verification.md`, `.claude/rules/vm-verification.md`) | Repository-level rule and a VM verification page describing UTM macOS guest scenarios, prerequisites, SSH/tooling setup, harness invocation flow, environment variables, common scenario sequences, trap-based cleanup, and guidance that Dynamic Resolution approximates frame-resolution changes rather than monitor hot-plug; display topology tests should use ScreenProvider fixtures and physical hot-plug is manual. |
| VM harness script implementation (`.claude/scripts/lyra-vm-harness.sh`) | Bash harness with subcommands for VM lifecycle (boot, shutdown, reboot), run/deploy (run-lyra builds host release, uploads, installs, starts daemon while persisting prior brew services state), artifact collection (capture gathers screenshot, unified logs, process sample, daemon log), state restoration (restore stops recorded daemon and conditionally restarts prior brew services state), arbitrary remote command execution (exec), and guest IP resolution (ip). Includes SSH/SCP helpers, SSH readiness polling, centralized logging, and environment-driven overrides. |
| YouTube downloader args (`Sources/WallpaperDataSource/YouTubeWallpaperDataSourceImpl.swift`) | Adds `--extractor-args youtube:player_client=android` and changes the `-f` selector to `bestvideo[...] / best[...]` to fall back to combined A/V when video-only stream is unavailable. |
| Version update (`Sources/VersionHandler/Resources/version.txt`) | Version increment from 2.13.10 to 2.13.14. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • GeneralD/lyra#262: Related changes to .claude/rules/dev-verification.md expanding developer verification workflow; both PRs touch VM-lane guidance and verification rules.

Poem

🐇 I hop the VM awake at night,
keys clutched close, build glowing bright.
Upload, start, a screenshot score—
capture logs, then tidy store.
The rabbit nods: snapshot saved, all right.

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title clearly refers to setting up a UTM macOS VM verification harness (#276), which aligns with the main changes: new harness script, VM verification rules, and developer verification guide updates. |
| Linked Issues check | ✅ Passed | All acceptance criteria from #276 are met: repo-local rules document VM verification setup, harness supports VM lifecycle operations via utmctl+SSH, enables guest build/install/run/restore, captures screenshots/logs/samples, clarifies Dynamic Resolution as resolution approximation not display topology, reserves topology testing for ScreenProvider fixtures, and uses SSH as primary path with GUI automation only as fallback. |
| Out of Scope Changes check | ✅ Passed | Version bump and yt-dlp format fallback change are supplementary to VM verification setup but slightly tangential to core harness objectives; however, both support VM testing scenarios (music playback, tool compatibility), so remain within reasonable scope boundaries. |

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4387a70c54

Comment thread .claude/scripts/lyra-vm-harness.sh Outdated
local ip=""
log "Waiting for $vm to be reachable via SSH (timeout: ${LYRA_VM_BOOT_TIMEOUT}s)..."
while [[ $SECONDS -lt $deadline ]]; do
ip="$(vm_ip "$vm")"

P2: Guard the SSH wait loop against missing IPs

When boot or reboot starts polling before UTM has published an IP address, vm_ip exits non-zero because the grep | head pipeline finds no match under set -euo pipefail; this assignment is not guarded, so the harness exits immediately instead of waiting until LYRA_VM_BOOT_TIMEOUT. Make the IP lookup non-fatal in the polling loop (or in vm_ip) so newly-started guests without an address yet can still become ready.

Reply (Owner, Author):

Fixed: changed the assignment to ip="$(vm_ip "$vm")" || true so that the script no longer exits under set -e when the IP has not been obtained yet.

Fixed in e98e904.
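
The guard discussed in this thread can be illustrated with a small polling loop. This is a sketch under assumptions, not the harness's actual `wait_for_ssh`: `wait_for_ip` and its `lookup` parameter (standing in for `vm_ip`) are hypothetical, and it counts attempts instead of using `$SECONDS`. The point it shows is that `|| true` keeps a failing lookup from aborting the script under `set -e`, so the loop keeps polling.

```bash
# wait_for_ip <lookup_cmd> <attempts> <interval_seconds>
# Polls until <lookup_cmd> prints a non-empty address, tolerating failures.
# <lookup_cmd> stands in for the harness's vm_ip (hypothetical parameter).
wait_for_ip() {
  local lookup="$1"
  local attempts="$2"
  local interval="$3"
  local ip=""
  local i=0
  while [ "$i" -lt "$attempts" ]; do
    # "|| true" makes a non-zero lookup non-fatal under set -e.
    ip="$("$lookup")" || true
    if [ -n "$ip" ]; then
      printf '%s\n' "$ip"
      return 0
    fi
    i=$((i + 1))
    sleep "$interval"
  done
  # No address appeared within the allowed attempts.
  return 1
}
```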


log "Pushing binary to guest..."
ssh_run "$ip" "mkdir -p /tmp/lyra-drop"
scp_put "$ip" "$binary" "/tmp/lyra-drop/lyra"

P1: Copy SwiftPM resource bundles with the VM binary

run-lyra uploads only the executable, but this package has SwiftPM resources (VersionHandler and MediaRemoteDataSource) and the existing make install target copies the generated *.bundle directories next to the binary. Without those bundles on the guest, Bundle.module lookups return unknown for the version and cannot find media-remote-helper.swift, so VM track/daemon verification is not exercising the same installed app behavior.

Reply (Owner, Author):

The current code already transfers and installs the bundles via /tmp/lyra-drop/. This comment targeted an older revision, so no action is needed.

Comment thread .claude/scripts/lyra-vm-harness.sh Outdated
scp_put "$ip" "$binary" "/tmp/lyra-drop/lyra"

log "Installing on guest..."
ssh_run "$ip" "sudo install -m 755 /tmp/lyra-drop/lyra /usr/local/bin/lyra"

P2: Start the binary that was just installed

On Apple-silicon Homebrew guests, the brew install from the prerequisites lives under /opt/homebrew/bin, and the repo's own executable lookup checks that path before /usr/local/bin. Since the harness installs the PR build to /usr/local/bin/lyra but later runs plain lyra daemon, the shell can resolve the preinstalled brew binary instead of the just-pushed build, invalidating VM results for the commit under test.

Reply (Owner, Author):

In the current code the launcher script starts /tmp/lyra-vm-test/lyra by absolute path, so the brew binary cannot interfere. This comment targeted an older revision, so no action is needed.
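
The resolution problem raised in this thread can be reproduced locally without a VM. The sketch below is illustrative only: it builds two fake `lyra` stubs in temporary directories (stand-ins for the Homebrew prefix and the harness's test directory) and shows that a bare command name picks whichever copy comes first on `PATH`, while an absolute path always picks the intended one.

```bash
# make_fake_lyra <dir> <label>: create an executable stub named "lyra" in <dir>.
make_fake_lyra() {
  mkdir -p "$1"
  printf '#!/bin/sh\necho "%s"\n' "$2" > "$1/lyra"
  chmod +x "$1/lyra"
}

# Temporary stand-ins for the two install locations (not the real paths).
demo_root="$(mktemp -d)"
make_fake_lyra "$demo_root/homebrew/bin" "brew build"
make_fake_lyra "$demo_root/lyra-vm-test" "PR build"

# Bare name: the first match on PATH wins, here the Homebrew stand-in.
by_name="$(PATH="$demo_root/homebrew/bin:$demo_root/lyra-vm-test:$PATH"; lyra)"

# Absolute path: PATH order no longer matters.
by_path="$("$demo_root/lyra-vm-test/lyra")"

echo "bare name     -> $by_name"
echo "absolute path -> $by_path"
```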

prior_state="$(ssh_run "$ip" "cat ~/.lyra-vm-prior-service-state 2>/dev/null || printf 'none'")"
if [[ "$prior_state" == "started" ]]; then
log "Restoring lyra brew service on guest..."
ssh_run "$ip" "brew services start lyra"

P2: Restore the overwritten brew binary before restart

On guests where Homebrew installs lyra into /usr/local/bin, run-lyra overwrites the brew-managed executable with the PR build and restore then starts the service again without reinstalling or restoring the original binary. In that setup the trap leaves the VM dirty and the supposedly restored brew service keeps running the test build after verification.

Reply (Owner, Author):

Fixed: changed the install destination from /usr/local/bin/ to /tmp/lyra-vm-test/ so that the brew-managed binary is no longer overwritten. On both Intel and Apple Silicon, the brew binary stays untouched after restore. The bundles are placed in the same directory, so Bundle.module also works correctly.

Fixed in e98e904.

Comment thread .claude/scripts/lyra-vm-harness.sh Outdated
Comment on lines +160 to +161
pid="$(ssh_run "$ip" "cat ~/.lyra-vm-daemon.pid 2>/dev/null || printf '?'")"
log "lyra daemon running on guest (PID=$pid)"

P2: Fail run-lyra when the daemon exits immediately

If lyra daemon fails during startup (for example because another instance still holds the flock after brew services stop failed, or because the pushed build crashes), the SSH command still writes the background PID and run-lyra reports success after the sleep without checking that the process is alive. This makes the harness proceed to health checks and captures against no PR daemon, so validate the PID after the grace period and surface the daemon log on failure.

Reply (Owner, Author):

Fixed: after sleep 3, the harness now checks that the daemon is alive with kill -0 $pid. If the daemon crashes right after startup, it prints the daemon log and calls die.

Fixed in e98e904.
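
The liveness check described in this reply could look roughly like the following. This is a local sketch with hypothetical function names (`daemon_alive_after`, `report_startup`); in the harness the PID lives on the guest and the check runs over SSH, which is not shown here.

```bash
# daemon_alive_after <pid> <grace_seconds>
# Returns 0 if the process still exists after the grace period, 1 otherwise.
daemon_alive_after() {
  local pid="$1"
  local grace="$2"
  sleep "$grace"
  # kill -0 delivers no signal; it only tests that the PID can be signaled.
  kill -0 "$pid" 2>/dev/null
}

# report_startup <pid> <grace_seconds> <log_file>
# On failure, print the daemon log to stderr so the crash reason is visible.
report_startup() {
  local pid="$1"
  local grace="$2"
  local log_file="$3"
  if daemon_alive_after "$pid" "$grace"; then
    echo "daemon running (PID=$pid)"
  else
    echo "daemon exited during startup" >&2
    if [ -f "$log_file" ]; then
      cat "$log_file" >&2
    fi
    return 1
  fi
}
```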

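Restrict loading to *.swift / Package.swift so that the rule is loaded only when working on Swift files.
Verifying native macOS behavior is needed only while writing Swift;
it is unnecessary for cross-platform configuration and documentation work.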

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.claude/scripts/lyra-vm-harness.sh:
- Around line 111-116: Replace the fixed sleep-based checks with polling that
verifies actual VM state transitions: in the shutdown path (where ssh_run is
used then utmctl status/stop/--kill are invoked) poll utmctl status for the
"stopped" state with a short sleep loop and a configurable timeout before
escalating to utmctl stop --kill; in the reboot path (where wait_for_ssh is
used) ensure you first detect the SSH session drop by polling until SSH fails,
then poll for a new successful SSH connection (using wait_for_ssh or the same
ssh_run probe) within a timeout to confirm the guest actually rebooted; apply
these changes around the ssh_run, utmctl, and wait_for_ssh usage so you only
force-stop or treat reboot as complete after the confirmed state transitions.
- Around line 144-156: The harness installs the built artifact to
/usr/local/bin/lyra but later starts the daemon with "lyra daemon", which can
pick up a different Homebrew-provided binary; update the detached start command
used in the ssh_run that currently contains "nohup lyra daemon >
~/.lyra-vm-daemon.log 2>&1 & printf '%s\n' \$! > ~/.lyra-vm-daemon.pid" to
invoke the exact installed binary (use /usr/local/bin/lyra) so the VM uses the
artifact just copied; keep the same nohup/stdout/stderr and PID-file behavior.
- Around line 197-199: The remote tilde should be expanded before calling
scp_get because scp in SFTP mode may not expand "~"; obtain the remote home
directory via ssh (e.g., run a remote 'printf %s "$HOME"' using the existing
$ip), store it in a variable like remote_home, then call scp_get with
"$remote_home/.lyra-vm-daemon.log" instead of the literal
"~/.lyra-vm-daemon.log"; keep the existing error handling/log calls (log and
$out_dir) and update references to scp_get, ip, out_dir, and log accordingly.
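
The reboot check requested in the first finding (see the guest go down, then see it come back) can be sketched as a two-phase poll. The names `wait_until` and `wait_for_reboot` are hypothetical, and the probe command is passed in by the caller, so this is an illustration of the idea rather than the harness's implementation.

```bash
# wait_until <up|down> <attempts> <interval_seconds> <probe...>
# Polls <probe> until it succeeds ("up") or fails ("down").
wait_until() {
  local want="$1"
  local attempts="$2"
  local interval="$3"
  shift 3
  local i=0
  local state
  while [ "$i" -lt "$attempts" ]; do
    if "$@"; then state=up; else state=down; fi
    if [ "$state" = "$want" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "$interval"
  done
  return 1
}

# wait_for_reboot <attempts> <interval_seconds> <probe...>
# Phase 1: the probe must start failing (the guest went down).
# Phase 2: the probe must succeed again (the guest came back).
wait_for_reboot() {
  local attempts="$1"
  local interval="$2"
  shift 2
  wait_until down "$attempts" "$interval" "$@" &&
    wait_until up "$attempts" "$interval" "$@"
}
```

A caller might use something like `wait_for_reboot 60 2 ssh -o BatchMode=yes -o ConnectTimeout=3 "user@$ip" true`, where the user name and `$ip` are placeholders.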
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 7352a26e-27b1-4948-b8a5-59cba3d153b5

📥 Commits

Reviewing files that changed from the base of the PR and between 794b48f and 3f7b1d9.

📒 Files selected for processing (4)
  • .claude/rules/dev-verification.md
  • .claude/rules/vm-verification.md
  • .claude/scripts/lyra-vm-harness.sh
  • Sources/VersionHandler/Resources/version.txt

GeneralD added 10 commits June 12, 2026 16:39
…rness.sh

- Add `LYRA_VM_SSH_HOST` env var to `vm_ip()` so the harness works with
  macOS Apple Virtualization Framework backend VMs where `utmctl ip-address`
  returns "Operation not supported by the backend"
- Convert `A && B || C` chains in `cmd_capture` to proper `if/then/else` (SC2015)
- Add `shellcheck disable=SC2088` with justification for tilde in remote scp path
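
The SC2015 change in this commit addresses a general shell pitfall: `A && B || C` is not equivalent to `if A; then B; else C; fi`, because `C` also runs when `A` succeeds but `B` fails. The sketch below uses a hypothetical failing step `step_b` to show the difference; it does not reproduce the actual `cmd_capture` code.

```bash
# Hypothetical step that fails, standing in for B.
step_b() { return 1; }

# Chain form: the fallback runs even though the condition (true) succeeded.
with_chain() { true && step_b || echo "fallback ran"; }

# if/then/else form: the else branch runs only when the condition fails.
with_if() {
  if true; then
    step_b
  else
    echo "fallback ran"
  fi
}

echo "chain: $(with_chain)"        # prints "chain: fallback ran"
echo "if:    $(with_if || true)"   # prints "if:    " (no fallback)
```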
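Open the URL in Safari and start playback automatically with AppleScript.
lyra track can then confirm directly whether MediaRemote picks it up.
No script is placed on the guest; everything is driven from the host over SSH.
- Add the scp_put_r helper (recursive directory copy)
- run-lyra: sudo rm -rf /tmp/lyra-drop beforehand to remove stale root-owned bundles
- run-lyra: place *.bundle in /usr/local/bin/ (the resource bundles are required)
- run-lyra: switch from nohup to sudo launchctl asuser + a launcher script,
  because showing an AppKit window from an SSH context requires injecting
  into the GUI bootstrap namespace
- capture: screencapture also needs a GUI session (added a TODO comment)

Verified: the lyrics of Payphone (Maroon 5) are shown as a full-screen overlay in the VM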
- Add explicit `pgrep -x lyra | kill` step in run-lyra before launching new daemon
  to prevent "Another lyra daemon is already running" when re-running
- Fix launcher script to use `echo "$!"` instead of `printf '%s\n' "$!"`
  (unquoted `\n` in sh was consumed by the shell → PID file contained `<pid>n`)
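
The `<pid>n` symptom can be reproduced in isolation. The sketch below shows one way the quoting can get lost (a command string handed to an inner `sh`); how exactly the launcher script lost its quotes is not shown in the commit, so treat the mechanism here as an assumption. The literal `4242` stands in for `$!`.

```bash
# Buggy: when the inner shell parses the command, \n is unquoted, so the
# backslash is dropped and printf receives the format "%sn".
buggy="$(sh -c "printf %s\n 4242")"

# Fixed along the lines of the commit: echo needs no format string at all.
fixed="$(sh -c 'echo "4242"')"

echo "buggy: $buggy"   # prints "buggy: 4242n"
echo "fixed: $fixed"   # prints "fixed: 4242"
```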
…YouTube 403

yt-dlp with the default web client triggers YouTube SABR streaming which
returns HTTP 403. Switching to `player_client=android` avoids SABR, but
video-only formats then require a GVS PO Token (unavailable), causing
"Requested format is not available".

Fix: add `best[ext=mp4][height<=maxHeight]` as fallback after the
video-only selector. This picks the combined A/V format (format 18,
360p MP4) when video-only streams are blocked — confirmed working in
VM verification. --no-audio is a no-op for combined formats.
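
The fallback described here relies on yt-dlp's `/` operator, which tries the selector on its left first and falls back to the one on its right. The sketch below builds such an argument list in shell for illustration. The real change lives in Swift, the `bestvideo` filter is elided in the PR summary, and mirroring the `best[...]` filter on both sides is an assumption; `format_selector` and `yt_dlp_args` are hypothetical helper names.

```bash
# format_selector <max_height>
# Prefer a video-only MP4 stream; after "/", fall back to a combined A/V MP4.
# Assumption: both sides use the same ext/height filter.
format_selector() {
  printf 'bestvideo[ext=mp4][height<=%s]/best[ext=mp4][height<=%s]\n' "$1" "$1"
}

# yt_dlp_args <max_height>
# Print the yt-dlp arguments discussed in the commit, one per line.
yt_dlp_args() {
  printf '%s\n' \
    --extractor-args "youtube:player_client=android" \
    -f "$(format_selector "$1")"
}
```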

fix(harness): use /tmp for daemon log+pid instead of $HOME

sudo resets HOME to /var/root, so "$HOME"/.lyra-vm-daemon.log and
.pid were written to /var/root/ and invisible to the SSH login user
(babu). Fixed to use /tmp/lyra-vm-daemon.{log,pid} which all users
can read.

fix(harness): add host-side UTM window fallback screenshot in capture

When SSH screencapture -x fails (no display in SSH context), falls back
to a Swift CGWindow lookup on the host to capture the largest UTM window.
Enables visual verification of the download indicator and UI state.
@GeneralD GeneralD merged commit e25d50d into main Jun 12, 2026
@GeneralD GeneralD deleted the test/#276-utm-vm-verification-harness branch June 12, 2026 10:16
@codecov

codecov Bot commented Jun 12, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

Development

Successfully merging this pull request may close these issues.

test: Set up the UTM macOS VM verification harness

2 participants